Probabilistic Modeling of Distributed Information Retrieval
Author
Abstract
This paper describes a model for optimum information retrieval over a distributed document collection. The model stems from Robertson's Probability Ranking Principle: once individual document rankings have been computed for the different subcollections, these local rankings are merged stepwise into a final ranking list in which the documents are ordered according to their probability of relevance. The model does not require full dissemination of subcollection-wide information. The documents of different subcollections are assumed to be indexed using different indexing vocabularies. Moreover, local rankings may be computed by individual probabilistic retrieval methods. The underlying data volume is arbitrarily scalable. A criterion for effectively limiting the ranking process to a subset of subcollections extends the model.
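The merging step described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes each subcollection already returns its local ranking sorted by descending probability of relevance, and that those probability estimates are directly comparable across subcollections. The names `merge_rankings` and `top_k`, the document identifiers, and the probability values are hypothetical.

```python
import heapq
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# (document id, estimated probability of relevance); both are illustrative.
Ranked = Tuple[str, float]


def merge_rankings(local_rankings: Iterable[Iterable[Ranked]]) -> Iterator[Ranked]:
    """Lazily merge local rankings into one ranking by descending probability.

    Assumption: every local ranking is already sorted by descending
    probability and the estimates are comparable across subcollections.
    """
    return heapq.merge(*local_rankings, key=lambda item: item[1], reverse=True)


def top_k(local_rankings: Iterable[Iterable[Ranked]], k: int) -> List[Ranked]:
    """Take only the first k merged results, leaving the rest unconsumed."""
    return list(islice(merge_rankings(local_rankings), k))


if __name__ == "__main__":
    # Hypothetical local rankings from three subcollections.
    ranking_a = [("a1", 0.92), ("a2", 0.40), ("a3", 0.15)]
    ranking_b = [("b1", 0.75), ("b2", 0.55)]
    ranking_c = [("c1", 0.60)]
    print(top_k([ranking_a, ranking_b, ranking_c], 4))
```

Because the merge is lazy, only as many entries are pulled from each local ranking as the final list needs, which loosely mirrors the idea of a stepwise merge that does not require every subcollection to ship its full ranking.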
Similar resources
Improved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
Towards an Information Retrieval Theory of Everything
I present three well-known probabilistic models of information retrieval in tutorial style: The binary independence probabilistic model, the language modeling approach, and Google’s page rank. Although all three models are based on probability theory, they are very different in nature. Each model seems well-suited for solving certain information retrieval problems, but not so useful for solving...
Language Modeling Approaches to Information Retrieval
This article surveys recent research in the area of language modeling (sometimes called statistical language modeling) approaches to information retrieval. Language modeling is a formal probabilistic retrieval framework with roots in speech recognition and natural language processing. The underlying assumption of language modeling is that human language generation is a random process; the goal ...
Distributed IR for Digital Libraries
This paper examines technology developed to support large-scale distributed digital libraries. We describe the method used for harvesting collection information using standard information retrieval protocols and how this information is used in collection ranking and retrieval. The system that we have developed takes a probabilistic approach to distributed information retrieval using a Logistic r...